Developing autonomous vehicles (AVs) helps improve the road safety and traffic efficiency of intelligent transportation systems (ITS). Accurately predicting the trajectories of traffic participants is essential to the decision-making and motion planning of AVs in interactive scenarios. Recently, learning-based trajectory predictors have shown state-of-the-art performance in highway and urban areas. However, most existing learning-based models are trained on fixed datasets and may perform poorly in continuously changing scenarios. Specifically, after learning a new scenario they may no longer perform well in previously learned ones. This phenomenon is called "catastrophic forgetting". Few studies investigate trajectory prediction in continuous scenarios, where catastrophic forgetting may happen. To handle this problem, first, a novel continual learning (CL) approach for vehicle trajectory prediction is proposed in this paper. Then, inspired by brain science, a dynamic memory mechanism is developed that uses a measurement of traffic divergence between scenarios to balance the performance and training efficiency of the proposed CL approach. Finally, datasets collected from different locations are used to design continual training and testing methods in experiments. Experimental results show that the proposed approach achieves consistently high prediction accuracy in continuous scenarios without re-training, mitigating catastrophic forgetting compared to non-CL approaches. The implementation of the proposed approach is publicly available at https://github.com/BIT-Jack/D-GSM
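The abstract does not say how traffic divergence is measured or how it drives the memory. As a minimal sketch of one possible reading, the following assumes each scenario is summarized by a discrete distribution (for example, a histogram of some traffic feature), uses a symmetric KL divergence, and applies a hypothetical rule in which scenarios that differ more from the new one keep more replay samples. The function names, the saturating formula, and the parameters `max_samples` and `scale` are all illustrative assumptions, not the authors' method.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_divergence(p, q):
    """Average of KL(p || q) and KL(q || p); zero when the distributions match."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def memory_budget(divergence, max_samples=1000, scale=1.0):
    """Hypothetical rule: map a divergence to a number of replay samples.

    The budget grows with the divergence and saturates at max_samples, so a
    past scenario that resembles the new one keeps few samples (cheaper
    training) and a very different one keeps many (less forgetting).
    """
    fraction = 1.0 - math.exp(-divergence / scale)
    return int(round(max_samples * fraction))


if __name__ == "__main__":
    old_scene = [0.5, 0.5]   # assumed feature histogram of a past scenario
    new_scene = [0.9, 0.1]   # assumed feature histogram of the new scenario
    d = symmetric_divergence(old_scene, new_scene)
    print(d, memory_budget(d))
```

The saturating mapping is only one way to trade accuracy against training cost; a linear or thresholded rule would fit the same description.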
translated by Google Translate
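Advertisement video editing aims to automatically edit advertising videos into shorter videos while retaining the coherent content and key information conveyed by advertisers. It mainly consists of two stages: video segmentation and segment assemblage. Existing methods perform well at the video segmentation stage but depend on extra cumbersome models and perform poorly at the segment assemblage stage. To address these problems, we propose M-SAN (Multi-modal Segment Assemblage Network), which performs the segment assemblage task efficiently and coherently. It uses multi-modal representations extracted from the segments and follows the encoder-decoder Ptr-Net framework with an attention mechanism. An importance-coherence reward is designed for training M-SAN. We experiment on the ADS-1K dataset, which contains more than 1000 videos covering rich ad scenarios collected from advertisers. To evaluate the methods, we propose a unified metric, Imp-Coh@Time, which assesses the importance, coherence, and duration of the outputs at the same time. Experimental results show that our method outperforms both random selection and previous methods on this metric. Ablation experiments further verify that the multi-modal representation and the importance-coherence reward significantly improve performance. The ADS-1K dataset is available at: https://github.com/yunlong10/ads-1k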
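Karyotyping is an important procedure for assessing the possible existence of chromosomal abnormalities. However, because of their non-rigid nature, chromosomes are usually curved in microscopic images, and these deformed shapes hinder chromosome analysis by cytogeneticists. In this paper, we present a self-attention guided framework to erase the curvature of chromosomes. The proposed framework extracts spatial information and local textures to preserve banding patterns in a regression module. With complementary information from the bent chromosome, a refinement module is designed to further improve fine details. In addition, we propose two dedicated geometric constraints to maintain the length and restore the distortion of chromosomes. To train our framework, we create a synthetic dataset in which curved chromosomes are generated from real-world straight chromosomes by grid deformation. Quantitative and qualitative experiments are conducted on synthetic and real-world data. Experimental results show that the proposed method can effectively straighten bent chromosomes while preserving banding details and length.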
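When designing a diagnostic model for clinical applications, it is crucial to ensure that the model is robust to a wide range of image corruptions. Here, an easy-to-use benchmark is established to evaluate how neural networks perform on corrupted pathology images. Specifically, corrupted images are generated by injecting nine types of common corruptions into validation images. In addition, two classification metrics and one ranking metric are designed to evaluate prediction and confidence performance under corruption. Evaluating on the two resulting benchmark datasets, we find that (1) a variety of deep neural network models suffer a significant accuracy decrease (double the error on clean images) and give unreliable confidence estimates on corrupted images; and (2) the correlation between validation and test errors is low, while replacing the validation set with our benchmark increases the correlation. Our code is available at https://github.com/superjamessyx/robustness_benchmark.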
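To make the "double the error on clean images" comparison concrete, here is a minimal sketch of how a corruption could be injected and how the error on corrupted images could be compared with the error on clean ones. Additive Gaussian noise stands in for the nine corruption types, which the abstract does not enumerate, and `corruption_error_ratio` is an illustrative quantity, not necessarily one of the benchmark's three metrics. Images are assumed to be grayscale rows of floats in [0, 1].

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a noisy copy of a grayscale image, clipped back to [0, 1]."""
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def error_rate(predictions, labels):
    """Fraction of predictions that disagree with the labels."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


def corruption_error_ratio(clean_preds, corrupted_preds, labels):
    """Error on corrupted inputs divided by error on clean inputs.

    A value of 2.0 means the model makes twice as many mistakes once the
    validation images are corrupted.
    """
    clean_err = error_rate(clean_preds, labels)
    if clean_err == 0:
        raise ValueError("clean error is zero; the ratio is undefined")
    return error_rate(corrupted_preds, labels) / clean_err


if __name__ == "__main__":
    image = [[0.0, 0.5], [0.5, 1.0]]
    print(add_gaussian_noise(image, sigma=0.1))

    labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
    clean_preds = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]      # one mistake
    corrupted_preds = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]  # two mistakes
    print(corruption_error_ratio(clean_preds, corrupted_preds, labels))
```

In practice the predictions would come from running the same trained model on the clean and the corrupted copies of the validation set.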
The goal of multimodal abstractive summarization (MAS) is to produce a concise summary from multimodal data (text and vision). Existing studies on MAS mainly focus on how to use the extracted visual features effectively, and have achieved impressive success on the high-resource English dataset. However, less attention has been paid to the quality of the visual features with respect to the summary, which may limit model performance, especially in low- and zero-resource scenarios. In this paper, we propose to improve summary quality through summary-oriented visual features. To this end, we devise two auxiliary tasks: a vision-to-summary task and a masked image modeling task. Together with the main summarization task, we optimize the MAS model via the training objectives of all these tasks. In this way, the MAS model is enhanced by capturing summary-oriented visual features, thereby yielding more accurate summaries. Experiments on 44 languages, covering mid-high-, low-, and zero-resource scenarios, verify the effectiveness and superiority of the proposed approach, which achieves state-of-the-art performance under all scenarios.
Given a document in a source language, cross-lingual summarization (CLS) aims to generate a concise summary in a different target language. Unlike in monolingual summarization (MS), naturally occurring source-language documents paired with target-language summaries are rare. To collect large-scale CLS samples, existing datasets typically involve translation in their creation. However, translated text differs from text originally written in that language, a phenomenon known as translationese. Although many efforts have been devoted to CLS, none of them has noted the phenomenon of translationese. In this paper, we first confirm that different approaches to constructing CLS datasets lead to different degrees of translationese. Then we design systematic experiments to investigate how translationese affects CLS model evaluation and performance when it appears in source documents or target summaries. In detail, we find that (1) translationese in the documents or summaries of test sets might lead to a discrepancy between human judgment and automatic evaluation; (2) translationese in training sets would harm model performance in real-world settings; (3) although machine-translated documents involve translationese, they are very useful for building CLS systems for low-resource languages under specific training strategies. Furthermore, we give suggestions for future CLS research, including dataset and model development. We hope that our work draws researchers' attention to translationese in CLS so that it is taken into account in the future.
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German. Based on the Transformer, we apply several effective variants. In our experiments, we utilize the pre-training-then-fine-tuning paradigm. In the first pre-training stage, we employ data filtering and synthetic data generation (i.e., back-translation, forward-translation, and knowledge distillation). In the second fine-tuning stage, we investigate speaker-aware in-domain data generation, speaker adaptation, prompt-based context modeling, target denoising fine-tuning, and boosted self-COMET-based model ensemble. Our systems achieve 0.810 and 0.946 COMET scores. The COMET scores of English-German and German-English are the highest among all submissions.
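Implicit neural representations have shown promising potential for 3D scene reconstruction. Recent work applies them to autonomous 3D reconstruction by learning information gain for view path planning. Effective as this is, computing the information gain is expensive, and collision checking for a 3D point is much slower with an implicit representation than with a volumetric one. In this paper, we propose to 1) leverage a neural network as an implicit function approximator for the information gain field and 2) combine the implicit fine-grained representation with a coarse volumetric representation to improve efficiency. Building on the improved efficiency, we propose a novel informative path planning method based on a graph-based planner. Our method demonstrates significant improvements in reconstruction quality and planning efficiency compared with autonomous reconstruction using implicit and explicit representations. We deploy the method on a real UAV, and the results show that it can plan informative views and reconstruct a scene with high quality.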
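The BARN (Benchmark Autonomous Robot Navigation) Challenge took place at the 2022 IEEE International Conference on Robotics and Automation (ICRA 2022) in Philadelphia, Pennsylvania. The aim of the challenge was to evaluate state-of-the-art autonomous ground navigation systems for moving robots through highly constrained environments in a safe and efficient manner. Specifically, the task was to navigate a standardized differential-drive ground robot from a predefined start location to a goal location without colliding with any obstacles, both in simulation and in the real world. Five teams from around the world participated in the qualifying simulation competition, three of which were invited to compete against each other on a set of physical obstacle courses at the conference center in Philadelphia. The competition results suggest that autonomous ground navigation in highly constrained spaces, despite seeming simple on the surface even to experienced roboticists, is actually far from being a solved problem. In this paper, we discuss the challenge, the approaches used by the top three winning teams, and the lessons learned to guide future research.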
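Image-text retrieval (ITR) is challenging because it must bridge the visual and linguistic modalities. Contrastive learning has been adopted by most prior work. Besides the limited number of negative image-text pairs, the capability of contrastive learning is restricted by manual weighting of negative pairs and by unawareness of external knowledge. In this paper, we propose a novel Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) method to improve cross-modal representation. First, a novel diversity-sensitive contrastive learning (DCL) architecture is devised. We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity sensitivity is achieved through adaptive negative pair weighting. Furthermore, CODER is designed with two branches. One learns instance-level embeddings from the image/text, and it also generates pseudo online clustering labels for its input image/text based on their embeddings. Meanwhile, the other branch learns to query a commonsense knowledge graph to form concept-level descriptors for both modalities. Afterwards, both branches use DCL to align the cross-modal embedding spaces, while an extra pseudo clustering label prediction loss promotes concept-level representation learning in the second branch. Extensive experiments on two popular benchmarks, MSCOCO and Flickr30K, show that CODER significantly outperforms state-of-the-art methods.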
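The abstract does not give the loss in closed form. As a rough illustration of contrastive learning with weighted negative pairs, the sketch below implements a standard InfoNCE-style loss for one positive pair and a set of negatives, with an optional weight per negative. The `hardness_weights` rule (a softmax over negative similarities, rescaled to sum to the number of negatives) is a hypothetical stand-in for "adaptive negative pair weighting", and the momentum-updated dynamic dictionaries are not modeled here. Similarities are assumed to be cosine similarities in [-1, 1].

```python
import math


def weighted_info_nce(pos_sim, neg_sims, neg_weights=None, temperature=0.07):
    """InfoNCE-style loss with an optional weight on each negative pair.

    With all weights equal to 1 this reduces to the usual
    -log(exp(s+/t) / (exp(s+/t) + sum_i exp(s_i/t))).
    """
    if neg_weights is None:
        neg_weights = [1.0] * len(neg_sims)
    pos = math.exp(pos_sim / temperature)
    neg = sum(
        w * math.exp(s / temperature) for w, s in zip(neg_weights, neg_sims)
    )
    return -math.log(pos / (pos + neg))


def hardness_weights(neg_sims, beta=1.0):
    """Hypothetical adaptive weighting: more similar (harder) negatives count more.

    Weights are a softmax over beta * similarity, rescaled so they sum to the
    number of negatives; equal similarities therefore give uniform weights.
    """
    raw = [math.exp(beta * s) for s in neg_sims]
    total = sum(raw)
    return [len(raw) * r / total for r in raw]


if __name__ == "__main__":
    negatives = [0.1, 0.4, -0.2]          # assumed image-text similarities
    weights = hardness_weights(negatives, beta=5.0)
    print(weights)
    print(weighted_info_nce(0.8, negatives))           # uniform weights
    print(weighted_info_nce(0.8, negatives, weights))  # adaptive weights
```

Rescaling the weights to sum to the number of negatives keeps the weighted loss on the same scale as the unweighted one, which makes the two easier to compare.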